Introduction

This file outlines the sampling strategy for the Punjab province of Pakistan.

General Notes on Sampling

The survey is a combined effort of the Early Learning Partnership project and of the Global Education Policy Dashboard project.

Overall, we draw a sample of 200 public schools, 200 private schools and 200 public-private partnership (PPP) schools, stratified by urban/rural status.

At this stage, it is important to note that there are certain districts we may not be able to visit due to security concerns. These are:

  • Mianwali
  • Dera Ghazi Khan (DG Khan)
  • Rajan Pur
  • Bhakkar

We have removed these districts from the sampling frame.

Of the 200 public schools to be surveyed, we would like approximately 100 to be schools that meet ECE quality standards (in the data set, public_strata==1). Of the remaining public schools, 50 will be schools that have ECE but do not meet quality standards (public_strata==2), and 50 will be schools that have no ECE at all and only have katchi classes (public_strata==3).

Particular Notes

Due to operational constraints, we did not draw a random sample of all schools at the province level. Instead, the survey team selected a convenience sample of six districts (out of 32) that spans North, Central and South Punjab and includes both richer and poorer districts. A convenience sample was appropriate given the security and operational constraints of working in Punjab. The selected districts were:

  • Attock
  • Faisalabad
  • Lahore
  • Muzaffargarh
  • Rahimyar Khan
  • Sargodha

To deal with potential refusals and closed schools, a set of replacement schools was also drawn. Within the final strata, schools were sampled with probability proportional to size, where size is the total pre-primary enrollment.

Sampling Frame

Our sampling frame therefore consists of public, private, and PPP schools in these six districts. We further restricted the frame to schools that have at least 10 children enrolled in pre-primary, at least 3 students in grade 1, and at least 3 students in grades 3, 4, or 5. The latter two restrictions ensure that the schools contain relevant students and teachers for the Global Education Policy Dashboard survey.
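The frame restrictions can be written as a simple row filter. This is a minimal base-R sketch, not the code used to build the frame. The data frame, the helper `restrict_frame()`, and the columns `district`, `pre_primary`, `grade1`, `grade3`, `grade4` and `grade5` are all hypothetical. It also assumes that "at least 3 students in grades 3, 4, or 5" means that at least one of those grades has 3 or more students.

```r
# Hypothetical columns: district, pre_primary, grade1, grade3, grade4, grade5
selected_districts <- c("Attock", "Faisalabad", "Lahore",
                        "Muzaffargarh", "Rahimyar Khan", "Sargodha")

restrict_frame <- function(schools, districts = selected_districts) {
  # Assumption: at least one of grades 3, 4 or 5 has 3 or more students
  upper_grades_ok <- schools$grade3 >= 3 | schools$grade4 >= 3 | schools$grade5 >= 3
  keep <- schools$district %in% districts &
    schools$pre_primary >= 10 &
    schools$grade1 >= 3 &
    upper_grades_ok
  # Drop rows where missing values leave the result undetermined (NA)
  schools[!is.na(keep) & keep, , drop = FALSE]
}
```

The security-affected districts drop out automatically, because none of them is among the six selected districts.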

Summary Statistics for Punjab Schools Overall in Sampling Frame

Summary Statistics for Sampling Frame
Note: means and standard deviations are rounded to two decimals (three for rural, a 0/1 indicator whose mean is the rural share). p0, p25, p50, p75 and p100 are the minimum, quartiles and maximum. complete is the number of schools with a non-missing value. hist is a small histogram of the distribution.

variable                                mean      sd  p0    p25    p50    p75  p100 complete  hist
Public Schools
num_teachers                            4.51    1.62   1      4    4.0      5    21     6989  ▇▂▁▁▁
total_enrollment                      148.56   93.60  12     90  127.0    183  1322     7910  ▇▁▁▁▁
total_katchi_enrollment                54.15   39.19   0     30   46.0     68   589    11539  ▇▁▁▁▁
total_katchi_enrollment_boys           28.44   26.37   0     11   23.0     38   386    11539  ▇▁▁▁▁
total_katchi_enrollment_girls          25.71   28.19   0      7   19.0     35   528    11539  ▇▁▁▁▁
total_ece_enrollment                   14.71   12.31   0      6   10.5     20    85      692  ▇▂▁▁▁
total_ece_enrollment_boys               7.08    7.75   0      2    5.0     10    44      692  ▇▂▁▁▁
total_ece_enrollment_gils               7.64    9.64   0      1    5.0     10    80      692  ▇▁▁▁▁
total_1st_enrollment                   34.83   26.43   3     18   29.0     43   399    11539  ▇▁▁▁▁
total_1st_enrollment_boys              17.65   18.17   0      6   13.0     24   217    11539  ▇▁▁▁▁
total_1st_enrollment_girls             17.19   21.01   0      4   11.0     22   340    11539  ▇▁▁▁▁
rural                                  0.857   0.350   0      1    1.0      1     1    11539  ▂▁▁▁▇
Private Schools
num_teachers                           14.80   13.62   2      8   12.0     17   299     4530  ▇▁▁▁▁
total_enrollment                      247.15  240.43  28    123  185.0    290  5641     7516  ▇▁▁▁▁
total_pre_primary_enrollment           79.64   58.72  10     44   66.0     98  1026     7516  ▇▁▁▁▁
total_pre_primary_enrollment_boys      46.94   34.58   4     27   39.0     57   568     4736  ▇▁▁▁▁
total_pre_primary_enrollment_girls     41.86   30.18   4     23   35.0     51   458     4711  ▇▁▁▁▁
total_1st_enrollment                   28.84   25.96   3     14   22.0     35   502     7469  ▇▁▁▁▁
total_1st_enrollment_boys              15.39   14.41   1      7   12.0     19   301     7499  ▇▁▁▁▁
total_1st_enrollment_girls             13.42   12.91   1      6   10.0     16   232     7486  ▇▁▁▁▁
rural                                  0.472   0.499   0      0    0.0      1     1     7516  ▇▁▁▁▇

Sampling of Schools in Districts

Schools (PSUs) will be selected using probability proportional to size (PPS) sampling, where size is the total pre-primary enrollment of the school. This method gives schools with larger enrollment a higher chance of being selected. It is most useful when sampling units vary considerably in size. If a fixed number of children is then sampled within each selected school, children in large schools and children in small schools end up with the same overall probability of selection.
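One standard way to draw a PPS sample is systematic PPS. The schools are laid end to end on a line whose length is total enrollment, and n equally spaced points are taken from a random start. Each school's inclusion probability is then n * size / total.

The base-R sketch below illustrates this with a hypothetical function `pps_systematic()`. It is not necessarily the procedure used for this survey. The code later in this document draws with `dplyr::sample_n(weight = ...)`, a sequential weighted draw whose inclusion probabilities are only approximately proportional to size.

```r
# Systematic PPS sampling (illustrative sketch).
# size: measure of size per school (e.g. pre-primary enrollment); n: schools to draw
pps_systematic <- function(size, n) {
  stopifnot(n >= 1, n <= length(size), all(size > 0))
  total <- sum(size)
  interval <- total / n                          # sampling interval
  # A school larger than the interval would be hit more than once
  if (any(size > interval)) stop("take schools larger than the interval with certainty first")
  start <- runif(1, 0, interval)                 # random start in (0, interval)
  points <- start + interval * (seq_len(n) - 1)  # n equally spaced selection points
  # Each point selects the school whose cumulative-size range contains it
  findInterval(points, c(0, cumsum(size)), left.open = TRUE)
}

# Usage with made-up enrollments: returns the indices of 3 selected schools
set.seed(1)
pps_systematic(c(10, 20, 30, 40, 50, 60), 3)
```

In practice the frame is often sorted first (for example by tehsil or urban/rural status). A systematic draw over a sorted list adds implicit stratification.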

Public Schools

Of the 200 public schools to be surveyed, we would like approximately 100 to be schools that meet ECE quality standards (in the data set, public_strata==1). Of the remaining public schools, 50 will be schools that have ECE but do not meet quality standards (public_strata==2), and 50 will be schools that have no ECE at all and only have katchi classes (public_strata==3).
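The three public strata can be coded from two school characteristics. This is a minimal sketch. It assumes hypothetical logical inputs `has_ece` and `meets_ece_standards` and a hypothetical helper `assign_public_strata()`; the data set itself stores the result in `public_strata`.

```r
# Code public_strata from two hypothetical logical inputs
assign_public_strata <- function(has_ece, meets_ece_standards) {
  ifelse(has_ece & meets_ece_standards, 1L,  # ECE meeting quality standards
         ifelse(has_ece, 2L,                 # ECE below quality standards
                3L))                         # no ECE, katchi classes only
}

# Target number of sampled public schools per stratum (sums to 200)
public_targets <- c(`1` = 100L, `2` = 50L, `3` = 50L)
```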

Around 86% of schools in the public school sampling frame are classified as rural. We stratify by urban/rural status and over-sample urban schools so that we have adequate power to detect differences between urban and rural schools. This results in a sample of around 64 urban schools and 136 rural schools.
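Because urban schools are over-sampled, estimates for the whole public frame need design weights. The sketch below computes base weights w_h = N_h / n_h.

The frame counts are approximations read off the public sampling-frame table: 11,539 schools, about 85.7% rural, so roughly 9,888 rural and 1,651 urban. The sample counts use the approximate 136/64 split. In practice, weights would be computed for the full set of strata (district, `public_strata`, urban/rural), which this sketch ignores.

```r
# Base design weights under disproportionate stratification: w_h = N_h / n_h
# (approximate, illustrative counts; see the lead-in)
N_h <- c(rural = 9888, urban = 1651)  # public schools in the frame, by stratum
n_h <- c(rural = 136, urban = 64)     # sampled public schools, by stratum
w_h <- N_h / n_h                      # frame schools represented by each sampled school

unweighted_rural <- n_h[["rural"]] / sum(n_h)
weighted_rural   <- (w_h[["rural"]] * n_h[["rural"]]) / sum(w_h * n_h)
```

Each sampled rural school then stands for about 72.7 frame schools, and each urban school for about 25.8. The unweighted rural share of the sample is 0.68, while the weighted share recovers the frame's share of roughly 0.857.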

Sampled schools are allocated to districts in proportion to each district's total pre-primary enrollment.
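Proportional allocation rarely yields whole numbers, so rounding each district separately can overshoot or undershoot the target. The private-school code in this document notes that two extra schools were selected and needed a fix. A largest-remainder rule avoids this by always summing exactly to the target.

The sketch below uses a hypothetical helper, `allocate_proportional()`, and made-up enrollment totals.

```r
# Allocate n sample schools across districts in proportion to enrollment,
# using largest-remainder rounding so the allocations sum exactly to n
allocate_proportional <- function(enrollment, n) {
  stopifnot(n >= 0, all(enrollment >= 0), sum(enrollment) > 0)
  quota <- n * enrollment / sum(enrollment)  # exact (fractional) quotas
  alloc <- floor(quota)                      # round everything down first
  leftover <- n - sum(alloc)                 # schools still to hand out
  if (leftover > 0) {
    # Give one extra school to the districts with the largest remainders
    extra <- order(quota - alloc, decreasing = TRUE)[seq_len(leftover)]
    alloc[extra] <- alloc[extra] + 1
  }
  alloc
}

# Illustrative (made-up) district pre-primary enrollment totals
enrollment <- c(Attock = 120, Faisalabad = 340, Lahore = 410,
                Muzaffargarh = 260, `Rahimyar Khan` = 230, Sargodha = 190)
allocate_proportional(enrollment, 200)
```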


Private Schools

For the 200 private schools to be surveyed, we stratify by district and by the urban/rural status of the school. Roughly 50% of the sample will be urban, compared to about 53% in the full private sampling frame (which is about 47% rural).

Replacement Schools

A replacement school was drawn for each sampled school. Replacement schools were randomly selected from schools in the same tehsil with the same urban/rural status, excluding the originally sampled schools. Each row of the replacement database contains the school name, location, and other information for the replacement school. The final 5 columns include the school code, school name, district, and tehsil of the originally sampled school that it replaces.
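Below is a minimal base-R sketch of the replacement draw described above. The helper `draw_replacements()` and the columns `school_id`, `tehsil` and `urban_rural` are hypothetical. The rule that a school is never used as a replacement twice is an assumption; this document does not state it.

```r
# For each originally sampled school, draw one replacement at random from
# schools in the same tehsil with the same urban/rural code that were not
# sampled themselves. Assumption: a replacement is never reused.
draw_replacements <- function(frame, sampled_ids) {
  is_sampled <- frame$school_id %in% sampled_ids
  sampled <- frame[is_sampled, , drop = FALSE]
  pool <- frame[!is_sampled, , drop = FALSE]
  used <- character(0)
  replacement <- rep(NA_character_, nrow(sampled))
  for (i in seq_len(nrow(sampled))) {
    candidates <- pool$school_id[pool$tehsil == sampled$tehsil[i] &
                                 pool$urban_rural == sampled$urban_rural[i] &
                                 !(pool$school_id %in% used)]
    if (length(candidates) > 0) {
      # sample.int avoids sample()'s odd behaviour when one candidate is left
      replacement[i] <- candidates[sample.int(length(candidates), 1)]
      used <- c(used, replacement[i])
    }
  }
  data.frame(original = sampled$school_id, replacement = replacement,
             stringsAsFactors = FALSE)
}
```

Schools with no eligible candidate keep `NA` as their replacement, which makes gaps in the replacement list easy to spot.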

For access reasons, we had to reallocate 46 schools from the Lahore Cantt. tehsil to our other districts. This applies only to private schools.

library(dplyr)
library(readr)

# Draw private schools outside Lahore Cantt. to replace the sampled Lahore Cantt. schools
sample_noncantt_private <- data_set_updated_private %>%
  # split each district's allocation equally between the two urban_rural codes
  mutate(
    rural_share = case_when(
      urban_rural == 1 ~ 0.5,
      urban_rural == 2 ~ 0.5,
      TRUE ~ 0)) %>%
  # district share of total pre-primary enrollment in the frame
  group_by(DistrictName) %>%
  mutate(total_ece = sum(total_pre_primary, na.rm = TRUE)) %>%
  ungroup() %>%
  mutate(district_share = total_ece / sum(total_pre_primary, na.rm = TRUE)) %>%
  left_join(sampled_districts_private,
            by = c("DistrictName", "TehsilName", "urban_rural")) %>%
  filter(!is.na(sampled_districts)) %>%       # keep the sampled districts only
  filter(TehsilName != "LAHORE CANTT.") %>%   # drop the inaccessible tehsil
  filter(is.na(sample)) %>%                   # keep schools with no `sample` flag
  group_by(DistrictName, urban_rural) %>%
  mutate(school_number = round(60 * district_share * rural_share, 0)) %>%
  # draw school_number schools per district x urban/rural cell, weighted by pre-primary enrollment
  sample_n(school_number, weight = total_pre_primary) %>%
  ungroup() %>%
  # the per-cell draws select two extra schools; subsample to exactly 49
  # so that the updated private sample totals 200
  sample_n(49, weight = total_pre_primary)

write_excel_csv(sample_noncantt_private,
                file.path(dir_frame, paste0("sample_noncantt_schools_private_", Sys.Date(), ".csv")))

# drop the Lahore Cantt. schools from the original sample and append their replacements
sample_private_update <- sample_private %>%
  filter(TehsilName != "LAHORE CANTT.") %>%
  bind_rows(sample_noncantt_private)
## Warning in bind_rows_(x, .id): Vectorizing 'haven_labelled' elements may not
## preserve their attributes

## Warning in bind_rows_(x, .id): Vectorizing 'haven_labelled' elements may not
## preserve their attributes
write_excel_csv(sample_private_update,
                file.path(dir_frame, paste0("sample_schools_private_", Sys.Date(), ".csv")))
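The private sample had to be adjusted by hand to reach exactly 200 schools, so a quick check of the final file is worthwhile. This is a minimal sketch with a hypothetical helper, `check_final_sample()`. The `school_code` column is an assumption, while `TehsilName` and the `"LAHORE CANTT."` label come from the code above.

```r
# Return a character vector of problems found in a final sample (empty if none)
check_final_sample <- function(sample_df, target = 200,
                               excluded_tehsil = "LAHORE CANTT.") {
  problems <- character(0)
  if (nrow(sample_df) != target)
    problems <- c(problems, sprintf("expected %d schools, found %d",
                                    target, nrow(sample_df)))
  if (anyDuplicated(sample_df$school_code) > 0)
    problems <- c(problems, "duplicate school codes")
  if (any(sample_df$TehsilName == excluded_tehsil, na.rm = TRUE))
    problems <- c(problems, "schools from the excluded tehsil remain")
  problems
}
```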

Summary Statistics of Sample

Summary Statistics for Sample of Schools
Columns as in the sampling frame table; means and standard deviations are rounded to two decimals (three for rural).

variable                                mean      sd  p0    p25    p50    p75  p100 complete  hist
Public Schools
num_teachers                            5.57    2.39   2   4.00    5.0   6.75    19       86  ▇▃▁▁▁
total_enrollment                      229.78  142.44  53 126.25  206.5 270.00   885       90  ▇▅▁▁▁
total_katchi_enrollment                88.83   56.58   8  51.75   74.0 106.50   343      200  ▇▅▂▁▁
total_katchi_enrollment_boys           43.02   37.50   0  16.00   37.0  62.00   288      200  ▇▃▁▁▁
total_katchi_enrollment_girls          45.81   47.50   0  14.75   33.5  66.00   310      200  ▇▂▁▁▁
total_ece_enrollment                   16.93   12.10   0   7.50   15.0  24.00    51       95  ▇▅▃▂▁
total_ece_enrollment_boys               7.38    8.57   0   2.00    5.0  10.50    44       95  ▇▃▁▁▁
total_ece_enrollment_gils               9.55   10.81   0   2.00    6.0  10.50    49       95  ▇▂▁▁▁
total_1st_enrollment                   60.57   49.61   7  32.00   47.0  68.50   399      200  ▇▂▁▁▁
total_1st_enrollment_boys              27.69   30.12   0   8.00   20.0  39.00   198      200  ▇▂▁▁▁
total_1st_enrollment_girls             32.88   39.60   0   7.00   21.5  46.00   340      200  ▇▁▁▁▁
rural                                  0.675   0.470   0   0.00    1.0   1.00     1      200  ▃▁▁▁▇
Private Schools
num_teachers                           17.97   15.81   4  10.00   13.0  20.00   120      118  ▇▁▁▁▁
total_enrollment                      332.01  297.53  54 169.75  231.5 377.00  2618      200  ▇▁▁▁▁
total_pre_primary_enrollment          109.31   78.32  14  59.00   86.5 125.75   468      200  ▇▃▁▁▁
total_pre_primary_enrollment_boys      64.05   47.18  15  34.00   52.0  74.00   261      129  ▇▃▁▁▁
total_pre_primary_enrollment_girls     53.78   36.44  10  32.00   43.0  60.00   207      129  ▇▃▁▁▁
total_1st_enrollment                   38.41   32.13   5  19.50   28.0  45.50   219      199  ▇▂▁▁▁
total_1st_enrollment_boys              21.10   18.54   2  10.00   15.5  23.25   128      200  ▇▂▁▁▁
total_1st_enrollment_girls             17.55   15.06   2   8.00   12.0  20.00    91      199  ▇▂▁▁▁
rural                                  0.470   0.500   0   0.00    0.0   1.00     1      200  ▇▁▁▁▇

Map of Selected Schools